(54) Title: A METHOD OF PROCESSING AN IMAGE



(57) Abstract 

A method of processing an image comprising the steps of: locating
within the image the position of at least one predetermined feature;
extracting from said image data representing each said feature; and
calculating for each feature a feature vector representing the
position of the image data of the feature in an N-dimensional space,
said space being defined by a plurality of reference vectors each of
which is an eigenvector of a training set of like features, in which
the image data of each feature is modified to normalise the shape of
each feature thereby to reduce its deviation from a predetermined
standard shape of said feature, which step is carried out before
calculating the corresponding feature vector.
This invention relates to a method of processing an
image and particularly, but not exclusively, to the use of
such a method in the recognition and encoding of images of
objects such as faces.

One field of object recognition which is potentially 
useful is in automatic identity verification techniques for 
restricted access buildings or fund transfer security, for 
example in the manner discussed in our UK application 
GB9005190.5. In many such fund transfer transactions a user
carries a card which includes machine-readable data stored
magnetically, electrically or optically. One particular
application of face recognition is to prevent the use of such
cards by unauthorised personnel by storing face identifying
data of the correct user on the card, reading the data out,
obtaining a facial image of the person seeking to use the card
by means of a camera, analysing the image, and comparing the
results of the analysis with the data stored on the card for
the correct user.

The storage capacity of such cards is typically only a
few hundred bytes, which is very much smaller than the memory
space needed to store a recognisable image as a frame of
pixels. It is therefore necessary to use an image processing
technique which allows the image to be characterised using the
smaller number of memory bytes.

Another application of image processing which reduces 
the number of bytes needed to characterise an image is in 
hybrid video coding techniques for video telephones as 
disclosed in our earlier filed application published as US 
patent 4841575. In this and similar applications the
perceptually important parts of the image are located and the
available coding data is preferentially allocated to those
parts.

A known method of such processing of an image comprises
the steps of: locating within the image the position of at
least one predetermined feature; extracting image data from
We have found that recognition accuracy of images of
faces, for example, can be improved greatly by such a
modifying step which reduces the effects of a person's
changing facial expression.

In the case of an image of a face, for example, the
predetermined feature could be the entire face or a part of it
such as the [illegible] mouth. Several predetermined features may
be located and characterised as vectors in the corresponding
space of eigenvectors if desired.

It will be clear to those skilled in this field that the
present invention is applicable to processing images of
objects other than the faces of humans notwithstanding that
the primary application envisaged by the applicant is in the
field of human face images and that the discussion and
specific examples of embodiments of the invention are directed
to such images.

The invention also enables the use of fewer eigen-pictures,
and hence results in a saving of storage or of
transmission capacity.

Further, by modifying the shape of the feature towards
a standard (topologically equivalent) feature shape, the
accuracy with which the feature can be located is improved.

Preferably the training set of images of like
features is modified to normalise the shape of each of the
training set of images thereby to reduce their deviation from
a predetermined standard shape of said feature, which step is
carried out before calculating the eigenvectors of the
training set of images.

The method is useful not only for object recognition,
but also as a hybrid coding technique in which feature
position data and feature representative data (the
N-dimensional vector) are transmitted to a receiver where an
image is assembled by combining the eigen-pictures
corresponding to the image vector.

Eigen-pictures provide a means by which the variation
in a set of related images can be extracted and used to
represent those images and others like them. For instance, an
eye image could be economically represented in terms of a 'best'
coordinate system of 'eigen-eyes'.

The eigen-pictures themselves are determined from a
training set of representative images, and are formed such
that the first eigen-picture embodies the maximum variance
between the images, and successive eigen-pictures have
monotonically decreasing variance. An image in the set can
then be expressed as a series, with the eigen-pictures
effectively forming basis functions:

$$ I = M + w_1 P_1 + w_2 P_2 + \dots + w_m P_m $$

where M = mean over entire training set of images

$w_i$ = component of the i'th eigen-picture

$P_i$ = i'th eigen-picture, of m

I = original image

If we truncate the above series we still have the best
representation we could for the given number of
eigen-pictures, in a mean-square-error sense.

The basis of eigen-pictures is chosen such that they
point in the directions of maximum variance, subject to being
orthogonal. In other words, each training image is considered
as a point in n-dimensional space, where 'n' is the size of
the training images in pels; eigen-picture vectors are then
chosen to lie on lines of maximum variance through the
cluster(s) produced.

Given training images $I_1, \dots, I_m$, we first form the
mean image M, and then the difference images (a.k.a.
'caricatures') $D_i = I_i - M$.

The first paragraph (above) is equivalent to choosing
our eigen-picture vectors $P_k$ such that

$$ \lambda_k = \frac{1}{m} \sum_{i=1}^{m} \left( P_k^T D_i \right)^2 \quad \text{is maximised} $$

with $P_i^T P_k = 0$, $i < k$.

The eigen-pictures $P_k$ above are in fact the eigenvectors
of a very large covariance matrix, the solution of which would
be intractable. However, the problem can be reduced to more
manageable proportions by forming the matrix L where

$$ L_{ij} = D_i^T D_j $$

and solving for the eigenvectors $v_k$ of L.
The eigen-pictures can then be found by

$$ P_k = \sum_{j=1}^{m} v_{kj} D_j $$
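The reduction to the small matrix can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `eigen_pictures`, the use of NumPy, the row-per-image layout and the unit-length normalisation of each eigen-picture are all choices made for the example.

```python
import numpy as np

def eigen_pictures(images, num_components):
    """Hypothetical helper: eigen-pictures from flattened training images.

    images: (m, n) array, one training image per row (m images, n pels).
    num_components: number of eigen-pictures to keep; assumed <= m - 1,
    because the m caricatures span at most m - 1 dimensions.
    """
    images = np.asarray(images, dtype=float)
    m = images.shape[0]
    mean = images.mean(axis=0)              # mean image M
    diffs = images - mean                   # caricatures D_i, shape (m, n)
    small = diffs @ diffs.T                 # m x m matrix of inner products D_i . D_j
    eigvals, eigvecs = np.linalg.eigh(small)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_components]
    pictures = eigvecs[:, order].T @ diffs  # P_k = sum_j v_kj D_j
    pictures /= np.linalg.norm(pictures, axis=1, keepdims=True)  # unit length (assumption)
    variances = eigvals[order] / m          # variance captured along each P_k
    return mean, pictures, variances
```

Only an m x m eigenproblem is solved, so the cost depends on the number of training images rather than on the number of pels per image.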

The term 'representation vector' has been used to refer to the
vector whose components ($w_i$) are the factors applied to each
eigen-picture ($P_i$) in the series. That is

$$ \Omega = (w_1, w_2, \dots, w_m)^T $$

The representation vector equivalent to an image I is
formed by taking the inner product of I's caricature with each
eigen-picture:

$$ w_i = (I - M)^T P_i, \quad \text{for } 1 \le i \le m. $$

Note that a certain assumption is made when it comes to
representing an image taken from outside the training set used
to create eigen-pictures; the image is assumed to be
sufficiently 'similar' to those in the training set to enable it
to be well represented by the same eigen-pictures.

The representation of two images can be compared by
calculating the Euclidean distance between them:

$$ d_{ij} = \| \Omega_i - \Omega_j \| $$

Thus, recognition can be achieved via a simple
threshold, $d_{ij} < T$ means recognised, or $d_{ij}$ can be used as a
sliding confidence scale.
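As an illustration of the comparison step, the sketch below projects an image onto a set of eigen-pictures and applies the distance threshold. The function names and the toy basis in the usage lines are hypothetical; the eigen-pictures are assumed to be stored as the rows of an orthonormal array.

```python
import numpy as np

def representation_vector(image, mean, pictures):
    """Components w_i = (I - M)^T P_i, one per eigen-picture (row of `pictures`)."""
    return pictures @ (np.asarray(image, dtype=float) - mean)

def is_recognised(omega_a, omega_b, threshold):
    """Return (recognised?, distance) using the Euclidean distance between vectors."""
    distance = float(np.linalg.norm(omega_a - omega_b))
    return distance < threshold, distance

# Toy usage with a hand-made two-picture basis in a 4-pel image space.
mean = np.zeros(4)
pictures = np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0]])
a = representation_vector([3.0, 4.0, 9.0, 9.0], mean, pictures)
b = representation_vector([0.0, 0.0, 1.0, 1.0], mean, pictures)
recognised, d = is_recognised(a, b, threshold=6.0)
```

The returned distance can also be used directly as the sliding confidence scale rather than being thresholded.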

Deformable templates consist of parametrically defined
geometric templates which interact with images to provide a
best fit of the template to a corresponding image feature.
For example, a template for an eye might consist of a circle
for the iris and two parabolas for the eye/eyelid boundaries,
where size, shape and location parameters are variable.

An energy function is formed by integrating certain
image attributes over template boundaries, and parameters are
iteratively updated in an attempt to minimise this function.
This has the effect of moving the template towards the best
available fit in the given image.

The location within the image of the position of at
least one predetermined feature may employ a first technique
to provide a coarse estimation of position and a second,
different, technique to improve upon the coarse estimation.
The second technique preferably involves the use of such a
deformable template technique.

The deformable template technique requires certain
filtered images in addition to the raw image itself, notably
peak, valley and edge images. Suitable processed images can
be obtained using morphological filters, and it is this stage
which is detailed below.

Morphological filters are able to provide a wide range
of filtering functions including nonlinear image filtering,
noise suppression, edge detection, skeletonization, shape
recognition etc. All of these functions are provided via
simple combinations of two basic operations termed erosion and
dilation. In our case we are only interested in valley, peak
and edge detection.

Erosion of greyscale images effectively causes bright
areas to shrink in upon themselves, whereas dilation causes
bright areas to expand. An erosion followed by a dilation
causes bright peaks to be lost (operator called 'open').
Conversely, a dilation followed by an erosion causes dark
valleys to be filled (operator called 'close'). For specific
details see Maragos P, (1987), "Tutorial on Advances in
Morphological Image Processing and Analysis", Optical
Engineering, Vol 26, No. 7.
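A minimal sketch of how peak, valley and edge images might be derived from erosion and dilation is given below. The text does not state which combinations are used, so the particular choices here (image minus opening for peaks, closing minus image for valleys, dilation minus erosion for edges), the flat square structuring element and all function names are assumptions made for illustration.

```python
import numpy as np

def _windows(image, size):
    """Stack every shifted copy of the image covered by a size x size window."""
    pad = size // 2                         # size is assumed to be odd
    padded = np.pad(image, pad, mode="edge")
    rows, cols = image.shape
    return np.stack([padded[r:r + rows, c:c + cols]
                     for r in range(size) for c in range(size)])

def erode(image, size=3):
    return _windows(image, size).min(axis=0)    # bright areas shrink

def dilate(image, size=3):
    return _windows(image, size).max(axis=0)    # bright areas expand

def opening(image, size=3):
    return dilate(erode(image, size), size)     # erosion then dilation: peaks lost

def closing(image, size=3):
    return erode(dilate(image, size), size)     # dilation then erosion: valleys filled

def peak_valley_edge(image, size=3):
    """Return (peaks, valleys, edges) for a 2-D greyscale image."""
    image = np.asarray(image, dtype=float)
    peaks = image - opening(image, size)        # what 'open' removed
    valleys = closing(image, size) - image      # what 'close' filled in
    edges = dilate(image, size) - erode(image, size)
    return peaks, valleys, edges
```

A single bright pel on a dark background appears only in the peak image, and a single dark pel on a bright background appears only in the valley image.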

In image processing systems of the kind to which the
present invention relates, it is often necessary to locate the
object, e.g. head or face, within the image prior to processing.

Usually this is achieved by edge detection, but
traditional edge detection techniques are purely local - an
edge is indicated whenever a gradient of image intensity
occurs - and hence will not in general form an edge that is
completely closed (i.e. forms a loop around the head) but will
instead create a number of edge segments which together
outline or partly outline the head. Post-processing of some
kind is thus usually necessary.

We have found that the adaptive contour model, or
"snake", technique is particularly effective for this purpose.
Preferably, the predetermined feature of the image is located
by determining parameters of a closed curve arranged to lie
adjacent a plurality of edge features of the image, said curve
being constrained to exceed a minimum curvature and to have a
minimum length compatible therewith. The boundary of the curve
may be initially calculated proximate the edges of the image,
and subsequently interactively reduced.

Prior to a detailed description of the physical
embodiment of the invention, the 'snake' signal processing
techniques mentioned above will now be described in greater
detail.

Introduced by Kass et al (Kass M, Witkin A,
Terzopoulos D, "Snakes: Active Contour Models",
International Journal of Computer Vision, 321-331, 1988),
snakes are a method of attempting to provide some of the
post-processing that our own visual system performs. A snake has
built into it various properties that are associated with both
edges and the human visual system (e.g. continuity, smoothness
and to some extent the capability to fill in sections of an
edge that have been occluded).

A snake is a continuous curve (possibly closed) that
attempts to dynamically position itself from a given starting
position in such a way that it 'clings' to edges in the image.
The form of snake that will be considered here consists of
curves that are piecewise polynomial. That is, the curve is
in general constructed from N segments $\{x_i(s), y_i(s)\}$, $i = 1, \dots, N$,
where each of the $x_i(s)$ and $y_i(s)$ are polynomials in the
parameter s. As the parameter s is varied a curve is traced
out.

From now on snakes will be referred to as the
parametric curve $\mathbf{u}(s) = (x(s), y(s))$ where s is assumed to vary
between 0 and 1. What properties should an 'edge hugging'
snake have?
The snake must be 'driven' by the image. That is, it
must be able to detect an edge in the image and align itself
with the edge. One way of achieving this is to try to
position the snake such that the average 'edge strength'
(however that may be measured) along the length of the snake
is maximised. If the measure of edge strength is $F(x, y) \ge 0$ at
the image point (x, y) then this amounts to saying that the
snake $\mathbf{u}(s)$ is to be chosen in such a way that the functional

$$ \int_{s=0}^{1} F(x(s), y(s))\, ds \qquad \ldots (1) $$

is maximised. This will ensure that the snake will tend to
mould itself to edges in the image if it finds them, but does
not guarantee that it will find them in the first place.
Given an image, the functional may have many local minima - a
static problem: finding them is where the 'dynamics' arise.

An edge detector applied to an image will tend to
produce an edge map consisting of mainly thin edges. This
means that the edge strength function tends to be zero at most
places in the image, apart from on a few lines. As a
consequence a snake placed some distance from an edge may not
be attracted towards the edge because the edge strength is
effectively zero at the snake's initial position. To help the
snake come under the influence of an edge, the edge image is
blurred to broaden the width of the edges.

If an elastic band were held around a convex object and
then let go, the band would contract until the object
prevented it from doing so further. At this point the band
would be moulded to the object, thus describing the boundary.
Two forces are at work here. Firstly that providing the
natural tendency of the band to contract; secondly the
opposing force provided by the object. The band contracts
because it tries to minimise its elastic energy due to
stretching. If the band were described by the parametric
curve $\mathbf{u}(s) = (x(s), y(s))$ then the elastic energy at any point s
is proportional to

$$ \left| \frac{d\mathbf{u}}{ds} \right|^2 = \left( \frac{dx}{ds} \right)^2 + \left( \frac{dy}{ds} \right)^2 $$

That is, the energy is proportional to the square of
how much the curve is being stretched at that point. The
elastic energy along its entire length, given the constraint
of the object, is minimised. Hence the elastic band assumes
the shape of the curve $\mathbf{u}(s) = (x(s), y(s))$ where $\mathbf{u}(s)$
minimises the functional

$$ \int_{s=0}^{1} \left\{ \left( \frac{dx}{ds} \right)^2 + \left( \frac{dy}{ds} \right)^2 \right\} ds \qquad \ldots (2) $$

subject to the constraints of the object. We would like
closed snakes to have analogous behaviour. That is, to have
a tendency to contract, but to be prevented from doing so by
the objects in an image. To model this behaviour the
parametric curve for the snake is chosen so that the
functional (2) tends to be minimised. If in addition the
forcing term (1) were included then the snake would be
prevented from contracting 'through objects' as it would be
attracted toward their edges. The attractive force would also
tend to pull the snake into the hollows of a concave boundary,
provided that the restoring 'elastic force' was not too great.

One of the properties of the edges that is difficult to
model is their behaviour when they can no longer be seen. If
one were looking at a car and a person stood in front of it,
few would have any difficulty imagining the contours of the
edge of the car that were occluded. They would be 'smooth'
extensions of the contours either side of the person. If the
above elastic band approach were adopted it would be found
that the band formed a straight line where the car was
occluded (because it tries to minimise energy, and thus length
in this situation). If however the band had some stiffness
(that is a resistance to bending, as for example displayed by
a flexible bar) then it would tend to form a smooth curve in
the occluded region of the image and be tangential to the
boundaries on either side.
Again a flexible bar tends to form a shape so that its
elastic energy is minimised. The elastic energy in bending is
dependent on the curvature of the bar, that is the second
derivatives. To help force the snake to emulate this type of
behaviour the parametric curve $\mathbf{u}(s) = (x(s), y(s))$ is chosen so
it tends to minimise the functional

$$ \int_{s=0}^{1} \left\{ \left( \frac{d^2x}{ds^2} \right)^2 + \left( \frac{d^2y}{ds^2} \right)^2 \right\} ds \qquad \ldots (3) $$

which represents a pseudo-bending energy term. Of course, if
a snake were made too stiff it would be difficult to force it
to conform to highly curved boundaries under the action of the
forcing term (1).

Three desirable properties of snakes have now been
identified. To incorporate all three into the snake at once
the parametric curve $\mathbf{u}(s) = (x(s), y(s))$ representing the snake
is chosen so that it minimises the functional

$$ I(\mathbf{u}) = \int_{s=0}^{1} \left\{ \alpha(s) \left[ \left( \frac{d^2x}{ds^2} \right)^2 + \left( \frac{d^2y}{ds^2} \right)^2 \right] + \beta(s) \left[ \left( \frac{dx}{ds} \right)^2 + \left( \frac{dy}{ds} \right)^2 \right] - F(x(s), y(s)) \right\} ds \qquad \ldots (4) $$


Here the terms $\alpha(s) > 0$ and $\beta(s) > 0$ represent
respectively the amount of stiffness and elasticity that the
snake is to have. It is clear that if the snake approach is
to be successful then the correct balance of these parameters
is crucial. Too much stiffness and the snake will not
correctly hug the boundaries; too much elasticity and closed
snakes will be pulled across boundaries and contract to a
point or may even break away from boundaries at concave
regions. The negative sign in front of the forcing term is
because minimising $-\int F(x, y)\, ds$ is equivalent to maximising
$\int F(x, y)\, ds$.
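To make the balance of the three terms concrete, the functional (4) can be evaluated numerically for a closed snake sampled at discrete nodes. The sketch below is illustrative only: it assumes constant stiffness and elasticity, n equally spaced nodes with spacing 1/n around the loop, simple finite differences, and a caller-supplied function `edge_strength(x, y)`, none of which are taken from the text.

```python
import numpy as np

def closed_snake_energy(x, y, edge_strength, alpha, beta):
    """Approximate functional (4) for a closed snake with constant alpha, beta."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    h = 1.0 / n                                   # parameter spacing around the loop
    dx = (np.roll(x, -1) - x) / h                 # first derivatives (stretching)
    dy = (np.roll(y, -1) - y) / h
    ddx = (np.roll(x, -1) - 2.0 * x + np.roll(x, 1)) / h**2   # second derivatives (bending)
    ddy = (np.roll(y, -1) - 2.0 * y + np.roll(y, 1)) / h**2
    f = np.array([edge_strength(xi, yi) for xi, yi in zip(x, y)])
    integrand = alpha * (ddx**2 + ddy**2) + beta * (dx**2 + dy**2) - f
    return float(np.sum(integrand) * h)

# With no image forces, a smaller circle has lower energy, so the snake contracts.
t = 2.0 * np.pi * np.arange(200) / 200
no_edges = lambda px, py: 0.0
big = closed_snake_energy(np.cos(t), np.sin(t), no_edges, alpha=0.0, beta=1.0)
small = closed_snake_energy(0.5 * np.cos(t), 0.5 * np.sin(t), no_edges, alpha=0.0, beta=1.0)
```

A non-zero edge strength along the curve lowers the energy, which is what allows an edge to hold the snake against its tendency to contract.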

As it stands, minimising the functional (4) is trivial.
If the snake is not closed then the solution degenerates into
a single point (x(s), y(s)) = constant, where the point is chosen
to minimise the forcing term $-F(x(s), y(s))$. Physically, this
is because the snake will tend to pull its two end points
together in order to minimise the elastic energy, and thus
shrink to a single point. The global minimum is attained at
the point in the image where the edge strength is largest. To
prevent this from occurring it is necessary to fix the
positions of the ends of the snake in some way. That is,
'boundary conditions' are required. It turns out to be
necessary to fix more than just the location of the end points
and two further conditions are required for a well posed
problem. A convenient condition is to impose zero curvature
at each end point.

Similarly, the global minimum for a closed-loop snake
occurs when it contracts to a single point. However, in
contrast to a fixed-end snake, additional boundary conditions
cannot be applied to eliminate the degenerate solution. The
degenerate solution in this case is the true global minimum.

Clearly the ideal situation is to seek a local minimum
in the locality of the initial position of the snake. In
practice the problem that is solved is weaker than this: find
a curve $\mathbf{u}(s) = (x(s), y(s)) \in H^2[0,1] \times H^2[0,1]$ such that

$$ \left. \frac{\partial I(\mathbf{u}(s) + \epsilon \mathbf{v}(s))}{\partial \epsilon} \right|_{\epsilon = 0} = 0; \quad \forall\, \mathbf{v}(s) \in H_0^2[0,1] \times H_0^2[0,1] \qquad \ldots (5) $$

Here $H^2[0,1]$ denotes the class of real valued functions
defined on [0,1] that have 'finite energy' in the second
derivatives (that is, the integral of the square of the second
derivatives exists) [Keller HB, Numerical Methods for Two-Point
Boundary Value Problems, Blaisdell, 1968] and $H_0^2[0,1]$ is the
class of functions in $H^2[0,1]$ that are zero at s = 0 and s = 1.
To see how this relates to finding a minimum consider $\mathbf{u}(s)$ to
be a local minimum and $\mathbf{u}(s) + \epsilon \mathbf{v}(s)$ to be a perturbation about
the minimum that satisfies the same boundary conditions (i.e.
$\mathbf{v}(0) = \mathbf{v}(1) = 0$).

Clearly, considered as a function of $\epsilon$,
$I(\epsilon) = I(\mathbf{u}(s) + \epsilon \mathbf{v}(s))$ is a minimum at $\epsilon = 0$. Hence the derivative
of $I(\epsilon)$ must be zero at $\epsilon = 0$. Equation (5) is therefore a
necessary condition for a local minimum. Although solutions
to (5) are not guaranteed to be minima for completely general
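edge strength functions, it has been found in practice that
solutions are indeed minima.

Standard arguments in the calculus of variations show
that problem (5) is equivalent to another problem, which is
simpler to solve: find a curve $(x(s), y(s)) \in C^4[0,1] \times C^4[0,1]$
that satisfies the pair of fourth order ordinary differential
equations

$$ -\frac{d^2}{ds^2} \left\{ \alpha(s) \frac{d^2x}{ds^2} \right\} + \frac{d}{ds} \left\{ \beta(s) \frac{dx}{ds} \right\} + \frac{1}{2} \frac{\partial F}{\partial x} = 0 \qquad \ldots (6) $$

$$ -\frac{d^2}{ds^2} \left\{ \alpha(s) \frac{d^2y}{ds^2} \right\} + \frac{d}{ds} \left\{ \beta(s) \frac{dy}{ds} \right\} + \frac{1}{2} \frac{\partial F}{\partial y} = 0 \qquad \ldots (7) $$

together with the boundary conditions
x(0), y(0), x(1), y(1) given, and

$$ \left. \frac{d^2x}{ds^2} \right|_{s=0} = \left. \frac{d^2y}{ds^2} \right|_{s=0} = \left. \frac{d^2x}{ds^2} \right|_{s=1} = \left. \frac{d^2y}{ds^2} \right|_{s=1} = 0 \qquad \ldots (8) $$

The statement of the problem is for the case of a
fixed-end snake, but if the snake is to form a closed loop
then the boundary conditions above are replaced by periodicity
conditions. Both of these problems can easily be solved using
finite differences.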

The finite difference approach starts by discretising
the interval [0,1] into N-1 equispaced subintervals of length
h = 1/(N-1) and defines a set of nodes

$$ \{ s_i \}_{i=1}^{N} $$

where $s_i = (i-1)h$.
The method seeks a set of approximations

$$ \{ (x_i, y_i) \}_{i=1}^{N} \ \text{to} \ \{ (x(s_i), y(s_i)) \}_{i=1}^{N} $$

by replacing the differential equations (6) and (7) in the
continuous variables with a set of difference equations in the
discrete variables [Keller HB, ibid.]. Replacing the
derivatives in (6) by difference approximations at the point
$s_i$ gives
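$$ -\frac{1}{h^4} \left\{ \alpha_{i+1} (x_{i+2} - 2x_{i+1} + x_i) - 2\alpha_i (x_{i+1} - 2x_i + x_{i-1}) + \alpha_{i-1} (x_i - 2x_{i-1} + x_{i-2}) \right\} + \frac{1}{h^2} \left\{ \beta_{i+1} (x_{i+1} - x_i) - \beta_i (x_i - x_{i-1}) \right\} + \frac{1}{2} \left. \frac{\partial F}{\partial x} \right|_{(x_i, y_i)} = 0; \quad \text{for } i = 3, 4, \dots, N-2 \qquad \ldots (9) $$

where $\alpha_i = \alpha(s_i)$ and $\beta_i = \beta(s_i)$. Similarly a difference
approximation to (7) may be derived. Note that the difference
equation only holds at internal nodes in the interval where
the indices referenced lie in the range 1 to N. Collecting
like terms together, (9) can be written as

$$ a_i x_{i-2} + b_i x_{i-1} + c_i x_i + d_i x_{i+1} + e_i x_{i+2} = f_i $$

where

$$ a_i = \frac{\alpha_{i-1}}{h^4}, \qquad b_i = -\left( \frac{2\alpha_i}{h^4} + \frac{2\alpha_{i-1}}{h^4} + \frac{\beta_i}{h^2} \right) $$

$$ c_i = \frac{\alpha_{i-1}}{h^4} + \frac{4\alpha_i}{h^4} + \frac{\alpha_{i+1}}{h^4} + \frac{\beta_{i+1}}{h^2} + \frac{\beta_i}{h^2} $$

$$ d_i = -\left( \frac{2\alpha_i}{h^4} + \frac{2\alpha_{i+1}}{h^4} + \frac{\beta_{i+1}}{h^2} \right), \qquad e_i = \frac{\alpha_{i+1}}{h^4} $$

$$ f_i = \frac{1}{2} \left. \frac{\partial F}{\partial x} \right|_{(x_i, y_i)} $$

Discretising both the differential equations (6) and
(7) and taking boundary conditions into account, the finite
difference approximations $\mathbf{x} = \{x_i\}$ and $\mathbf{y} = \{y_i\}$ to $\{x(s_i)\}$ and $\{y(s_i)\}$,
respectively, satisfy the following system of algebraic
equations

$$ K\mathbf{x} = \mathbf{f}(\mathbf{x}, \mathbf{y}), \qquad K\mathbf{y} = \mathbf{g}(\mathbf{x}, \mathbf{y}) \qquad \ldots (10) $$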
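The structure of the matrices K and the right hand
vectors $\mathbf{f}$ and $\mathbf{g}$ are different depending on whether closed or
open snake boundary conditions are used. If the snake is
closed then fictitious nodes at $s_0$, $s_{-1}$, $s_{N+1}$ and $s_{N+2}$ are
introduced and the difference equation (9) is applied at nodes
1, 2, N-1 and N. Periodicity implies that $x_0 = x_N$, $x_{-1} = x_{N-1}$,
$x_{N+1} = x_1$ and $x_{N+2} = x_2$. With these conditions in force the
coefficient matrix becomes

$$ K = \begin{pmatrix}
c_1 & d_1 & e_1 & & & & a_1 & b_1 \\
b_2 & c_2 & d_2 & e_2 & & & & a_2 \\
a_3 & b_3 & c_3 & d_3 & e_3 & & & \\
 & a_4 & b_4 & c_4 & d_4 & e_4 & & \\
 & & \ddots & \ddots & \ddots & \ddots & \ddots & \\
e_{N-1} & & & & a_{N-1} & b_{N-1} & c_{N-1} & d_{N-1} \\
d_N & e_N & & & & a_N & b_N & c_N
\end{pmatrix} $$

and the right hand side vector is

$$ (f_1, f_2, \dots, f_N)^T $$

For fixed-end snakes fictitious nodes at $s_0$ and $s_{N+1}$ are
introduced and the difference equation (9) is applied at nodes
$s_2$ and $s_{N-1}$. Two extra difference equations are introduced to
approximate the zero curvature boundary conditions:

$$ \left. \frac{d^2x}{ds^2} \right|_{s=0} = \left. \frac{d^2x}{ds^2} \right|_{s=1} = 0 $$

namely $x_0 - 2x_1 + x_2 = 0$ and $x_{N-1} - 2x_N + x_{N+1} = 0$. The coefficient matrix
is now

$$ K = \begin{pmatrix}
c_2 - a_2 & d_2 & e_2 & & & \\
b_3 & c_3 & d_3 & e_3 & & \\
a_4 & b_4 & c_4 & d_4 & e_4 & \\
 & a_5 & b_5 & c_5 & d_5 & e_5 \\
 & & \ddots & \ddots & \ddots & \ddots \\
 & & a_{N-2} & b_{N-2} & c_{N-2} & d_{N-2} \\
 & & & a_{N-1} & b_{N-1} & c_{N-1} - e_{N-1}
\end{pmatrix} $$

and the right hand side vector is

$$ (f_2 - (2a_2 + b_2)x_1,\ f_3 - a_3 x_1,\ \dots,\ f_{N-3},\ f_{N-2} - e_{N-2} x_N,\ f_{N-1} - (2e_{N-1} + d_{N-1}) x_N)^T $$

The right hand side vector for the difference equations
corresponding to (7) is derived in a similar fashion.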

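The system (10) represents a set of non-linear
equations that has to be solved. The coefficient matrix is
symmetric and positive definite, and banded for the fixed-end
snake. For a closed-loop snake with periodic boundary
conditions it is banded, apart from a few off-diagonal
entries. As the system is non-linear it is solved
iteratively. The iteration performed is

$$ K\mathbf{x}_{n+1} + \frac{\mathbf{x}_{n+1} - \mathbf{x}_n}{\gamma} = \mathbf{f}(\mathbf{x}_n, \mathbf{y}_n) \quad \text{for } n = 0, 1, 2, \dots $$

$$ K\mathbf{y}_{n+1} + \frac{\mathbf{y}_{n+1} - \mathbf{y}_n}{\gamma} = \mathbf{g}(\mathbf{x}_n, \mathbf{y}_n) \quad \text{for } n = 0, 1, 2, \dots $$

where $\gamma > 0$ is a stabilisation parameter. This can be rewritten
as

$$ \left( K + \frac{1}{\gamma} I \right) \mathbf{x}_{n+1} = \frac{1}{\gamma} \mathbf{x}_n + \mathbf{f}(\mathbf{x}_n, \mathbf{y}_n), \qquad \left( K + \frac{1}{\gamma} I \right) \mathbf{y}_{n+1} = \frac{1}{\gamma} \mathbf{y}_n + \mathbf{g}(\mathbf{x}_n, \mathbf{y}_n) $$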
This system has to be solved for each n. For a
closed-loop snake the matrix on the left hand side is difficult to
invert directly because the terms that are outside the main
diagonal band destroy the band structure. In general, the
coefficient matrix K can be split into the sum of a banded
matrix B plus a non-banded matrix A; K = A + B. For a fixed-end
snake the matrix A would be zero. The system of equations is
now solved for each n by performing the iteration

$$ \left( B + \frac{1}{\gamma} I \right) \mathbf{x}_{n+1}^{(k+1)} = -A \mathbf{x}_{n+1}^{(k)} + \frac{1}{\gamma} \mathbf{x}_n + \mathbf{f}(\mathbf{x}_n, \mathbf{y}_n) \quad \text{for } k = 0, 1, 2, \dots $$

$$ \left( B + \frac{1}{\gamma} I \right) \mathbf{y}_{n+1}^{(k+1)} = -A \mathbf{y}_{n+1}^{(k)} + \frac{1}{\gamma} \mathbf{y}_n + \mathbf{g}(\mathbf{x}_n, \mathbf{y}_n) \quad \text{for } k = 0, 1, 2, \dots $$

The matrix $(B + I/\gamma)$ is a band matrix and can be
expressed as a product of Cholesky factors $LL^T$ [Johnson,
Reiss, ibid.]. The systems are solved at each stage by first
solving

$$ L \mathbf{z}^{(k+1)} = -A \mathbf{x}_{n+1}^{(k)} + \frac{1}{\gamma} \mathbf{x}_n + \mathbf{f}(\mathbf{x}_n, \mathbf{y}_n) $$

followed by

$$ L^T \mathbf{x}_{n+1}^{(k+1)} = \mathbf{z}^{(k+1)} $$

Notice that the Cholesky decomposition only has to be
performed once.
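The iteration for a closed-loop snake can be sketched as follows. This is an illustrative simplification rather than the implementation described in the text: stiffness and elasticity are taken as constants, the node spacing h defaults to 1, dense NumPy matrices stand in for band storage, and the caller-supplied `force(x, y)` is assumed to return the two right hand side vectors of system (10). All function names are hypothetical.

```python
import numpy as np

def closed_snake_matrix(n, alpha, beta, h=1.0):
    """Periodic five-diagonal coefficient matrix K for constant alpha and beta."""
    a = alpha / h**4
    b = -4.0 * alpha / h**4 - beta / h**2
    c = 6.0 * alpha / h**4 + 2.0 * beta / h**2
    K = np.zeros((n, n))
    for i in range(n):
        K[i, i] = c
        K[i, (i - 1) % n] = K[i, (i + 1) % n] = b
        K[i, (i - 2) % n] = K[i, (i + 2) % n] = a
    return K

def split_band(K, half_width=2):
    """Split K into A (entries outside the band) and B (banded part), K = A + B."""
    i, j = np.indices(K.shape)
    B = np.where(np.abs(i - j) <= half_width, K, 0.0)
    return K - B, B

def solve_split(A, L, rhs, max_inner=500, tol=1e-12):
    """Inner iteration: (B + I/gamma) x_new = rhs - A x_old, using L L^T = B + I/gamma."""
    x = np.zeros_like(rhs)
    for _ in range(max_inner):
        # np.linalg.solve ignores the triangular structure; a dedicated
        # triangular solver would normally be used here.
        z = np.linalg.solve(L, rhs - A @ x)
        x_next = np.linalg.solve(L.T, z)
        if np.max(np.abs(x_next - x)) < tol:
            return x_next
        x = x_next
    return x

def run_closed_snake(x, y, force, alpha, beta, gamma, steps, h=1.0):
    """Run `steps` outer iterations of the stabilised snake update."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    K = closed_snake_matrix(len(x), alpha, beta, h)
    A, B = split_band(K)
    L = np.linalg.cholesky(B + np.eye(len(x)) / gamma)   # factorised once only
    for _ in range(steps):
        fx, fy = force(x, y)
        x, y = (solve_split(A, L, x / gamma + fx),
                solve_split(A, L, y / gamma + fy))
    return x, y
```

With the image forces set to zero, a circular snake shrinks at every outer step, which is the contraction behaviour expected of a closed-loop snake.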

Model-based coding schemes use 2-D or 3-D models of
scene objects in order to reduce the redundancy in the
information needed to encode a moving sequence of images. The
location and tracking of moving objects is of fundamental
importance to this. Videoconference and videophone type
scenes may present difficulties for conventional machine
vision algorithms as there can often be low contrast and
'fuzzy' moving boundaries between a person's hair and the
background. Adaptive contour models or 'snakes' form a class
of techniques which are able to locate and track object
boundaries; they can fit themselves to low contrast boundaries
and can fill in across boundary segments between which there
is little or no local evidence for an edge. This paper
discusses the use of snakes for isolating the head boundary in
images as well as a technique which combines block motion
estimation and the snake: the 'power-assisted' snake.

The snake is a continuous curve (possibly closed) which
attempts to dynamically position itself from a given starting
position in such a way that it clings to edges in the image.
Full details of the implementation for both closed and
'fixed-end' snakes are given in Waite JB, Welsh WJ, "Head Boundary
Location using Snakes", British Telecom Technology Journal,
Vol 8, No 3, July 1990, which describes two alternative
implementation strategies: finite elements and finite
differences. We implemented both closed and fixed-end snakes
using finite differences. The snake is initialised around the
periphery of a head-and-shoulders image and allowed to
contract under its own internal elastic force. It is also
acted on by forces derived from the image which are generated
by first processing the image using a Laplacian-type operator
with a large space constant, the output of which is rectified
and modified using a smooth non-linear function. The
rectification results in isolating the 'valley' features which
have been shown to correspond to the subjectively important
boundaries in facial images; the non-linear function
effectively reduces the weighting of strong edges relative to
weaker edges in order to give the weaker boundaries a better
chance to influence the snake. After about 200 iterations of
the snake it reaches the position hugging the boundary of the
head. In a second example, a fixed-end snake with its end
points at the bottom corners of the image was allowed to
contract in from the sides and top of the image. The snake
stabilises on the boundary between hair and background
although this is a relatively low-contrast boundary in the
image. As the snake would face problems trying to contract
across a patterned background, it might be better to derive
the image forces from a moving edge detector.

In Kass et al, ibid., an example is shown of snakes
being used to track the moving lips of a person. First, the
snake is stabilised on the lips in the first frame of a moving
sequence of images; in the second frame it is initialised in
the position corresponding to its stable position in the
previous frame and allowed to achieve equilibrium again.
There is a clear problem with the technique in this form in
that if the motion is too great between frames, the snake may
lock on to different features in the next frame and thus lose
track. Kass suggests a remedy using the principle of
'scale-space continuation': the snake is allowed to stabilise first
on an image which has been smoothed using a Gaussian filter
with a large space constant; this has the effect of pulling
the snake in from a large distance. After equilibrium has
occurred, the snake is presented with a new set of image
forces derived by using a Gaussian with slightly smaller space
constant and the process is continued until equilibrium has
occurred in the image at the highest level of resolution
possible.

This is clearly a computationally expensive process; a
radically simpler technique has been developed and found to
work well and this will now be described. After the snake has
reached equilibrium in the first frame of a sequence, block
motion estimation is carried out at the positions of the snake
nodes (the snake is conventionally implemented by
approximating it with a set of discrete nodes - 24 in our
implementation). The motion estimation is performed from one
frame into the next frame which is the opposite sense to that
conventionally performed during motion compensation for video
coding. If the best match positions for the blocks are
plotted in the next frame then, due to the 'aperture problem',
a good match can often be found at a range of points along a
boundary segment which is longer than the side of the block
being matched. The effect is to produce a non-uniform
spacing of the points. The snake is then initialised in the
frame with its nodes at the positions of the best match block
positions and allowed to run for a few iterations, typically
ten. The result is the nodes are now uniformly distributed
along the boundary; the snake has successfully tracked the
boundary, the block motion estimation having acted as a sort
of 'power-assist' which will ensure tracking is maintained as
long as the overall motion is not greater than the maximum
displacement of the block search. As the motion estimation is
performed at only a small set of points, the computation time
is not increased significantly.
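The block motion estimation step can be sketched as below. The matching criterion is not specified in the text, so the sum of absolute differences used here is an assumption, as are the function names, the block size and the search range. Nodes are given as (row, column) positions, the block size is assumed even, and each node is assumed to lie at least half a block from the border of the previous frame.

```python
import numpy as np

def best_match(prev_frame, next_frame, node, block=8, search=4):
    """Find where the block centred on `node` in prev_frame best matches in next_frame."""
    prev_frame = np.asarray(prev_frame, dtype=float)
    next_frame = np.asarray(next_frame, dtype=float)
    half = block // 2
    r, c = node
    ref = prev_frame[r - half:r + half, c - half:c + half]
    best_cost, best_pos = None, node
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            top, left = r + dr - half, c + dc - half
            if top < 0 or left < 0:
                continue                      # candidate block leaves the frame
            cand = next_frame[top:top + 2 * half, left:left + 2 * half]
            if cand.shape != ref.shape:
                continue                      # clipped at the far border
            cost = np.abs(cand - ref).sum()   # sum of absolute differences (assumption)
            if best_cost is None or cost < best_cost:
                best_cost, best_pos = cost, (r + dr, c + dc)
    return best_pos

def power_assist(prev_frame, next_frame, nodes, block=8, search=4):
    """Move every snake node to its best-match position in the next frame."""
    return [best_match(prev_frame, next_frame, node, block, search) for node in nodes]
```

The returned positions would then be used to initialise the snake in the next frame, after which a few snake iterations redistribute the nodes along the boundary.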

Both fixed-end and closed snakes have been shown to
perform object boundary location even in situations where there
may be a low contrast between the object and its background.

A composite technique using both block motion
estimation and snake fitting has been shown to perform
boundary tracking in a sequence of moving images. The
technique is simpler to implement than an equivalent
coarse-to-fine resolution technique. The methods described in the
paper have so far been tested in images where the object
boundaries do not have very great or discontinuous curvature
at any point; if these conditions are not met, the snake
would fail to conform itself correctly to the boundary
contour. One solution, currently being pursued, is to
effectively split the boundaries into a number of shorter
segments and fit these segments with several fixed-end snakes.

According to a second aspect of the present invention
a method of verifying the identity of the user of a data
carrier comprises: generating a digital facial image of the
user; receiving the data carrier and reading therefrom
identification data; performing the method of the first aspect
of the present invention; comparing each feature vector, or
data derived therefrom, with the identification data; and
generating a verification signal in dependence upon the
comparison.

According to a yet further aspect of the present
invention apparatus for verifying the identity of the user of
a data carrier comprises: means for generating a digital
facial image of the user; means for receiving the data carrier
and reading therefrom identification data; means for
performing the method of the first aspect of the present
invention; and means for comparing each feature vector, or
data derived therefrom, with the identification data and
generating a verification signal in dependence upon the
comparison.

An embodiment of the invention will now be described,
by way of example only, with reference to the accompanying
drawings in which:

Figure 1 is a flow diagram of the calculation of a
feature vector;

Figure 2 illustrates apparatus for credit verification
using the method of the present invention; and

Figure 3 illustrates the method of the present
invention.

Referring to Figure 1, an overview of an embodiment 
incorporating both aspects of the invention will be described. 

An image of the human face is captured in some manner
- for example by using a video camera or by a photograph - and
digitised to provide an array of pixel values. A head
detection algorithm is employed to locate the position within
the array of the face or head. This head location stage may
comprise one of several known methods but is preferably a
method using the above described "snake" techniques. Pixel
data lying outside the boundaries thus determined are ignored.

The second step is carried out on the pixel data lying 
inside the boundaries to locate the features to be used for 
recognition - typically the eyes and mouth. Again, several 
location techniques for finding the position of the eyes and 
mouth are known from the prior art, but preferably a two-stage
process of coarse location followed by fine location is 
employed. The coarse location technique might, for example, 
be that described in US 4841575. 

The fine location technique preferably uses the
deformable template technique described by Yuille et al,
"Feature Extraction from Faces using Deformable Templates",
Harvard Robotics Lab Technical Report Number 88/2, published
in Computer Vision and Pattern Recognition, June 1989, IEEE.
In this technique, which has been described above, a line
model topologically equivalent to the feature is positioned
(by the coarse location technique) near the feature and is
iteratively moved and deformed until the best fit is obtained.
The feature is identified as being at this position.

Next, the shape of the feature is changed until it
assumes a standard, topologically equivalent, shape. If the
fine location technique utilised deformable templates as
disclosed above, then the deformation of the feature can be
achieved to some extent by reversing the deformation of the
template to match the feature to the initial, standard, shape
of the template.

Since the exact position of the feature is now known,
and its exact shape is specified, this
information can be employed as identifiers of the image
supplementary to the recognition process using the feature
vector of the feature. All image data outside the region
identified as being the feature are ignored and the image data
identified as being the feature are resolved into their
orthogonal eigen-picture components corresponding to that
feature. The component vector is then compared with a
component vector corresponding to a given person to be
identified and, in the event of substantial similarity,
recognition may be indicated.
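
As a minimal sketch of resolving a feature image into eigen-picture components, the following assumes that the eigen-pictures are orthonormal and that images are held as flattened lists of pixel values. The optional mean image, the function names and the four-pixel example values are hypothetical and are not taken from the specification.

```python
def dot(a, b):
    """Inner product of two equal-length pixel lists."""
    return sum(x * y for x, y in zip(a, b))


def component_vector(feature_pixels, eigen_pictures, mean_pixels=None):
    """Project a flattened feature image onto each eigen-picture.
    If a mean image is supplied (an assumption), it is subtracted first."""
    if mean_pixels is not None:
        feature_pixels = [p - m for p, m in zip(feature_pixels, mean_pixels)]
    return [dot(feature_pixels, e) for e in eigen_pictures]


def reconstruct(components, eigen_pictures):
    """Weighted sum of eigen-pictures; recovers the part of the image
    lying in the space spanned by the eigen-pictures."""
    n = len(eigen_pictures[0])
    return [sum(w * e[i] for w, e in zip(components, eigen_pictures))
            for i in range(n)]


# Two orthonormal four-pixel "eigen-pictures" (illustrative values only).
eigen_pictures = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]
feature = [3.0, 1.0, 3.0, 1.0]
components = component_vector(feature, eigen_pictures)
```

Here the feature happens to lie wholly within the span of the two eigen-pictures, so the two components are sufficient to rebuild it exactly; in general only an approximation is recovered.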

Referring to Figure 2, an embodiment of the invention
suitable for credit card verification will now be described.

A video camera 1 receives an image of a prospective
user of a credit card terminal. Upon entry of the card to a
card entry device 2, the analogue output of the video camera
1 is digitised by an AD converter 3, and sequentially clocked
into a framestore 4. A video processor 5 (for example, a
suitably programmed digital signal processing chip such as the
AT&T DSP 20) is connected to access the framestore 4 and
processes the digital image therein to form an edge-enhanced
image. One method of doing this is simply to subtract each
sample from its predecessor to form a difference picture, but
a better method involves the use of a Laplacian type of
operator, the output of which is modified by a sigmoidal
function which suppresses small levels of activity due to
noise as well as very strong edges whilst leaving intermediate
values barely changed. By this means, a smoother edge image
is generated, and weak edge contours such as those around the
line of the chin are enhanced. This edge picture is stored in
an edge picture frame buffer 6. The processor then executes
a closed loop snake method, using finite differences, to
derive a boundary which encompasses the head. Once the snake
algorithm has converged, the position of the boundaries of the
head in the edge image, and hence in the corresponding image in
the framestore 4, is known.
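
The two edge-enhancement options mentioned above may be sketched as follows, for illustration only. The 4-neighbour form of the Laplacian, the treatment of border pixels and, in particular, the logistic form and parameters of `sigmoid_limit` are assumptions; the specification does not give the formula of the sigmoidal function.

```python
import math


def difference_picture(image):
    """Subtract each sample from its predecessor along every row.
    The first sample of a row has no predecessor and is set to zero."""
    return [[row[c - 1] - row[c] if c > 0 else 0 for c in range(len(row))]
            for row in image]


def laplacian(image):
    """4-neighbour Laplacian; border pixels are left at zero."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = (image[r - 1][c] + image[r + 1][c]
                         + image[r][c - 1] + image[r][c + 1]
                         - 4 * image[r][c])
    return out


def sigmoid_limit(value, midpoint=20.0, slope=5.0, ceiling=100.0):
    """Hypothetical logistic squashing of an edge response: magnitudes
    well below midpoint are pushed towards zero and very large ones
    saturate at ceiling. The sign of the response is preserved."""
    magnitude = abs(value)
    squashed = ceiling / (1.0 + math.exp(-(magnitude - midpoint) / slope))
    return math.copysign(squashed, value)


# A vertical step edge between columns 1 and 2.
step_image = [[0, 0, 10, 10] for _ in range(4)]
edges = laplacian(step_image)
edge_picture = [[sigmoid_limit(v) for v in row] for row in edges]
```

The difference picture responds only to changes along a row, whereas the Laplacian responds to changes in both directions, which is one reason it may be preferred.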

The edge image in the frame buffer 6 is then processed to
derive a coarse approximation to the location of the features
of interest - typically the eyes and the mouth. The method of
Nagao is one suitable technique (Nagao M, "Picture Recognition
and Data Structure", Graphic Languages - Ed. Rosenfeld) as
described in our earlier application EP0225729. The estimates
of position thus derived are used as starting positions for
the dynamic template process which establishes the exact
feature position.

Accordingly, processor 5 employs the method described
in Yuille et al (Yuille A, Cohen D, Hallinan P, (1988),
"Facial Feature Extraction by Deformable Templates", Harvard
Robotics Lab. Technical Report no. 88-2) to derive position
data for each feature which consists of a size (or resolution)
and a series of point coordinates given as fractions of the
total size of the template. Certain of these points are
designated as keypoints which are always internal to the
template, the other points always being edge points. These
keypoint position data are stored, and may also be used as
recognition indicia. This is indicated in Figure 3.

Next, the geometrical transformation of the feature to
the standard shape is performed by the processor 5. This
transformation takes the form of a mapping between triangular
facets of the regions and the templates. The facets consist
of local collections of three points and are defined in the
template definition files. The mapping is formed by
considering the x,y values of template vertices with each of
the x',y' values of the corresponding region vertices - this
yields two plane equations from which each x',y' point can be
calculated given any x,y within the template facet, and thus
the image data can be mapped from the region sub-image.

The entire template sub-image is obtained by rendering
(or scan-converting) each constituent facet pel by pel, taking
the pel's value from the corresponding mapped location in the
equivalent region's sub-image.
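
The two plane equations may be illustrated by the following sketch, which assumes each takes the form x' = a.x + b.y + c and y' = d.x + e.y + f and solves for the coefficients from the three vertex correspondences by Cramer's rule. The function names and the example coordinates are hypothetical.

```python
def plane_coefficients(template_tri, values):
    """Solve v = a*x + b*y + c through three template vertices (x, y)
    carrying the values v (either the x' or the y' of the region)."""
    (x1, y1), (x2, y2), (x3, y3) = template_tri
    v1, v2, v3 = values
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if det == 0:
        raise ValueError("degenerate facet: vertices are collinear")
    a = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
    b = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
    c = (x1 * (y2 * v3 - y3 * v2) - y1 * (x2 * v3 - x3 * v2)
         + v1 * (x2 * y3 - x3 * y2)) / det
    return a, b, c


def facet_mapping(template_tri, region_tri):
    """Return a function mapping template (x, y) to region (x', y')."""
    plane_x = plane_coefficients(template_tri, [p[0] for p in region_tri])
    plane_y = plane_coefficients(template_tri, [p[1] for p in region_tri])

    def mapping(x, y):
        return (plane_x[0] * x + plane_x[1] * y + plane_x[2],
                plane_y[0] * x + plane_y[1] * y + plane_y[2])

    return mapping


template_tri = [(0, 0), (4, 0), (0, 4)]
region_tri = [(10, 20), (18, 20), (10, 24)]
to_region = facet_mapping(template_tri, region_tri)
```

In this example the region facet is the template facet stretched by two horizontally and shifted by (10, 20), so the template point (1, 1) maps to (12, 21).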

The processor 5 is arranged to perform the mapping of
the extracted region sub-images to their corresponding generic
template size and shape. The keypoints on the regions form a
triangular mesh with a corresponding mesh defined for the
generic template shape; mappings are then formed from each
triangle in the generic mesh to its equivalent in the region
mesh. The distorted sub-images are then created and stored in
the template data structures for later display.

The central procedure in this module is the 'template
stretching' procedure. This routine creates each distorted
template sub-image facet by facet (each facet is defined by
three connected template points). A mapping is obtained from
each template facet to the corresponding region facet and then
the template facet is filled in pel by pel with image data
mapped from the region sub-image. After all facets have been
processed in this way the distorted template sub-image will
have been completely filled in with image data.
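
A facet-by-facet fill of this kind might be sketched as below. The sketch assumes integer template coordinates, non-degenerate facets, images indexed as image[y][x] and nearest-pel sampling, and it uses barycentric weights as an equivalent way of applying the facet mapping; none of these details is prescribed by the text and all names are hypothetical.

```python
def barycentric(p, tri):
    """Barycentric weights of point p with respect to triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    w2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return w1, w2, 1.0 - w1 - w2


def stretch_template(region_image, template_points, region_points,
                     facets, width, height):
    """Fill a width x height template sub-image facet by facet, taking
    each pel from the mapped location in the region sub-image."""
    out = [[0] * width for _ in range(height)]
    for facet in facets:
        t_tri = [template_points[i] for i in facet]
        r_tri = [region_points[i] for i in facet]
        xs = [p[0] for p in t_tri]
        ys = [p[1] for p in t_tri]
        for y in range(min(ys), max(ys) + 1):
            for x in range(min(xs), max(xs) + 1):
                w = barycentric((x, y), t_tri)
                if min(w) < -1e-9:
                    continue  # pel lies outside this template facet
                rx = sum(wi * p[0] for wi, p in zip(w, r_tri))
                ry = sum(wi * p[1] for wi, p in zip(w, r_tri))
                out[y][x] = region_image[int(round(ry))][int(round(rx))]
    return out


# Region: a 5 x 5 sub-image whose pel value encodes its position.
region = [[10 * y + x for x in range(5)] for y in range(5)]
template_points = [(0, 0), (2, 0), (2, 2), (0, 2)]
region_points = [(0, 0), (4, 0), (4, 4), (0, 4)]
facets = [(0, 1, 2), (0, 2, 3)]
stretched = stretch_template(region, template_points, region_points,
                             facets, 3, 3)
```

In the example the 5 x 5 region is reduced to a 3 x 3 template, so every second pel of the region is taken. Pels on an edge shared by two facets are written twice with the same value.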

The standardised feature image thus produced is then
stored in a feature image buffer 7. An eigen-picture buffer
8 contains a plurality (for example 50) of eigen-pictures
of increasing sequency which have previously been
derived in known manner from a representative population
(preferably using an equivalent geometric normalisation
technique to that disclosed above). A transform processor 9
(which may in practice be realised as processor 5 acting under
suitable stored instructions) derives the co-ordinates or
components of the feature image with regard to each
eigen-picture, to give a vector of 50 numbers, using the method
described above. The card entry device 2 reads from the
inserted credit card the 50 components which characterise the
correct user of that card, which are input to a comparator 10
(which may again in practice be realised as part of a single
processing device) which measures the distance in pattern
space between the two vectors. The preferred metric is the
Euclidean distance, although other distance metrics (e.g. a
"city block" metric) could equally be used. If this distance
is less than a predetermined threshold, correct recognition is
indicated to an output 11; otherwise, recognition failure is
signalled.
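
The operation of the comparator may be sketched as follows, under the assumption that both vectors are available as equal-length lists of numbers. The function names, the three-component vectors and the threshold value are illustrative only.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points in pattern space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block_distance(a, b):
    """Sum of absolute component differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def verify(measured, stored, threshold, metric=euclidean_distance):
    """True (recognition) if the distance is below the threshold,
    False (recognition failure) otherwise."""
    if len(measured) != len(stored):
        raise ValueError("vectors must have the same number of components")
    return metric(measured, stored) < threshold


stored = [4.0, 2.0, -1.0]    # components read from the card
measured = [4.0, 5.0, 3.0]   # components derived from the camera image
accepted_euclidean = verify(measured, stored, 6.0)
accepted_city_block = verify(measured, stored, 6.0, city_block_distance)
```

The example shows that a threshold is tied to the metric in use: the same pair of vectors lies at a Euclidean distance of 5 but a city block distance of 7, so a threshold of 6 accepts under the first metric and rejects under the second.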

Other data may also be incorporated into the
recognition process; for example, data derived during template
deformation, or head measurements (e.g. the ratio of head
height to head width derived during the head location stage)
or the feature position data as mentioned above. Recognition
results may be combined in the manner indicated in our earlier
application GB9005190.5.

Generally, some preprocessing of the image is provided
(indicated schematically as 12 in Figure 2); for example,
noise filtering (spatial or temporal) and brightness or
contrast normalisation.

Variations in lighting can produce a spatially variant 
effect on the image brightness due to shadowing by the brows 
etc. It may be desirable to further pre-process the images to 
remove most of this variation by using a second derivative 
operator or morphological filter in place of the raw image
data currently used. A blurring filter would probably also be 
required. 

It might also be desirable to reduce the effects of 
variations in geometric normalisation on the representation 
vectors. This could be accomplished by using low-pass
filtered images throughout, which should give more stable
representations for recognition purposes.
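
One simple low-pass filter is a 3 x 3 mean (box) filter, sketched below purely as an example. The choice of filter, its size and the handling of border pels (averaging over whichever neighbours exist) are assumptions; the specification does not name a particular filter.

```python
def box_filter(image):
    """3 x 3 mean filter. At the borders the mean is taken over the
    neighbours that fall inside the image."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0.0, 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += image[rr][cc]
                        count += 1
            out[r][c] = total / count
    return out


impulse = [[0, 0, 0],
           [0, 9, 0],
           [0, 0, 0]]
smoothed = box_filter(impulse)
```

The single bright pel is spread over its neighbourhood, so a small error in where a facet boundary falls changes the sampled values less than it would in the unfiltered image.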
1. A method of processing an image comprising the steps
of:

locating within the image the position of at least one
predetermined feature;

extracting image data from said image representing each
said feature; and

calculating for each feature a feature vector
representing the position of the image data of the feature in
an N-dimensional space, said space being defined by a
plurality of reference vectors each of which is an eigenvector
of a training set of images of like features;

characterised in that the method further comprises the
step of:

modifying the image data of each feature to normalise
the shape of each feature thereby to reduce its deviation from
a predetermined standard shape of said feature, which step is
carried out before calculating the corresponding feature
vector.

2. A method according to claim 1, wherein the step of
modifying the image data of each feature involves the use of
a deformable template technique.

3. A method according to claim 1 or 2, wherein the step of
locating within the image the position of at least one
predetermined feature employs a first technique to provide a
coarse estimation of position and a second, different,
technique to improve upon the coarse estimation.

4. A method according to claim 3 wherein the second 
technique involves the use of a deformable template technique. 

5. A method according to any preceding claim in which the
training set of images of like features are modified to
normalise the shape of each of the training set of images
thereby to reduce their deviation from a predetermined
standard shape of said feature, which step is carried out
before calculating the eigenvectors of the training set of
images.






6. A method according to any preceding claim comprising
locating a portion of the image by determining parameters of
a closed curve arranged to lie adjacent a plurality of edge
features of the image, said curve being constrained to exceed
a minimum curvature and to have a minimum length compatible
therewith.

7. A method as claimed in claim 6 in which the boundary of
the curve is initially calculated proximate the edges of the 
image, and subsequently interactively reduced. 

8. A method according to either one of claims 6 and 7 in
which the portion of the image is a face or a head of a
person.

9. A method according to any preceding claim further
comprising determining the position of each feature within the
image.

10. A method of verifying the identity of the user of a
data carrier comprising:

generating a digital facial image of the user;

receiving the data carrier and reading therefrom
identification data;

performing the method of any one of claims 1 to 8;

comparing each feature vector, or data derived
therefrom, with the identification data; and

generating a verification signal in dependence upon
the comparison.

11. Apparatus for verifying the identity of the user of a
data carrier comprising:

means for generating a digital facial image of the
user;

means for receiving the data carrier and reading
therefrom identification data;

means for performing the method of any one of claims 1
to 8; and

means for comparing each feature vector, or data
derived therefrom, with the identification data and generating
a verification signal in dependence upon the comparison.

12. Apparatus as claimed in claim 11 in which the means for
generating a digital facial image of the user comprises a
video camera the output of which is connected to an AD
convertor.
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